Simultaneous Nearest Neighbor Search

Authors

  • Piotr Indyk
  • Robert D. Kleinberg
  • Sepideh Mahabadi
  • Yang Yuan
Abstract

Motivated by applications in computer vision and databases, we introduce and study the Simultaneous Nearest Neighbor Search (SNN) problem. Given a set of data points, the goal of SNN is to design a data structure that, given a collection of queries, finds a collection of close points that are “compatible” with each other. Formally, we are given k query points Q = q1, ..., qk and a compatibility graph G with vertices in Q, and the goal is to return data points p1, ..., pk that minimize (i) the weighted sum of the distances from qi to pi and (ii) the weighted sum, over all edges (i, j) in the compatibility graph G, of the distances between pi and pj. The problem has several applications in computer vision and databases, where one wants to return a set of consistent answers to multiple related queries. Furthermore, it generalizes several well-studied computational problems, including Nearest Neighbor Search, Aggregate Nearest Neighbor Search and the 0-extension problem. In this paper we propose and analyze the following general two-step method for designing efficient data structures for SNN. In the first step, for each query point qi we find its (approximate) nearest neighbor point p̂i; this can be done efficiently using existing approximate nearest neighbor structures. In the second step, we solve an off-line optimization problem over the sets q1, ..., qk and p̂1, ..., p̂k; this can be done efficiently given that k is much smaller than n. Even though p̂1, ..., p̂k might not constitute the optimal answers to queries q1, ..., qk, we show that, for the unweighted case, the resulting algorithm satisfies an O(log k / log log k)-approximation guarantee. Furthermore, we show that the approximation factor can in fact be reduced to a constant for compatibility graphs frequently occurring in practice, e.g., 2D grids, 3D grids or planar graphs. Finally, we validate our theoretical results by preliminary experiments. In particular, we show that the “empirical approximation factor” provided by the above approach is very close to 1.
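
The two-step method described in the abstract can be illustrated with a small sketch for the unweighted case. This is not the authors' implementation: the function names (`snn_cost`, `nearest_neighbor`, `two_step_snn`) are hypothetical, Euclidean distance is assumed, a linear scan stands in for an approximate nearest neighbor structure, and the off-line step is solved by exhaustive search over the k candidates, which is only feasible for very small k.

```python
import math
from itertools import product


def snn_cost(queries, answers, edges):
    """Unweighted SNN objective (assumed Euclidean metric):
    sum_i d(q_i, p_i) + sum over edges (i, j) of d(p_i, p_j)."""
    unary = sum(math.dist(q, p) for q, p in zip(queries, answers))
    pairwise = sum(math.dist(answers[i], answers[j]) for i, j in edges)
    return unary + pairwise


def nearest_neighbor(query, data):
    """Step 1 stand-in: exact nearest neighbor by linear scan.
    A real system would use an approximate NN data structure here."""
    return min(data, key=lambda p: math.dist(query, p))


def two_step_snn(queries, data, edges):
    """Hypothetical sketch of the two-step method.

    Step 1: find a candidate p_hat_i for every query q_i.
    Step 2: choose each answer from the candidate set only, minimizing
            the SNN objective (brute force over k^k assignments).
    """
    candidates = [nearest_neighbor(q, data) for q in queries]
    best = min(
        product(candidates, repeat=len(queries)),
        key=lambda answers: snn_cost(queries, answers, edges),
    )
    return list(best)


if __name__ == "__main__":
    queries = [(0.0, 0.0), (4.0, 0.0)]
    data = [(0.0, 1.0), (4.0, 0.5), (9.0, 9.0)]
    edges = [(0, 1)]  # the two queries should receive compatible answers
    answers = two_step_snn(queries, data, edges)
    print(answers, snn_cost(queries, answers, edges))
```

With an empty edge set the sketch returns each query's own nearest neighbor; adding the edge (0, 1) pulls both answers onto a single data point because the pairwise term penalizes answers that are far apart.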

Similar resources

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Nearest neighbor classifier: Simultaneous editing and feature selection

Nearest neighbor classifiers demand significant computational resources (time and memory). Editing of the reference set and feature selection are two different approaches to this problem. Here we encode the two approaches within the same genetic algorithm (GA) and simultaneously select features and reference cases. Two data sets were used: the SATIMAGE data and a generated data set. The GA was fou...

A Parallel Algorithms on Nearest Neighbor Search

(k-)nearest neighbor search has very high computational costs. The algorithms presented for nearest neighbor search in high-dimensional spaces have suffered from the curse of dimensionality, which severely affects either the runtime or the storage requirements of the algorithms. Parallelization of nearest neighbor search is a suitable solution for decreasing the workload caused by nearest neigh...

Online Learning of Binary Feature Indexing for Real-Time SLAM Relocalization

In this paper, we propose an indexing method for approximate nearest neighbor search of binary features. Unlike the popular Locality Sensitive Hashing (LSH), the proposed method constructs the hash keys by an online learning process instead of pure randomness. In the learning process, the hash keys are constructed with the aim of obtaining uniform hash buckets and high collision ra...

Publication date: 2016